COS 511 : Theoretical Machine Learning
Abstract
In the last few lectures, we talked about various kinds of online learning problems. We started with the simplest case, in which there is a perfect expert who is always right. In the last lecture, we generalized this to the case in which there may be no single perfect expert, but instead a panel or subcommittee of experts whose combined prediction is always right. We also introduced the perceptron algorithm to find the correct combination of experts. The perceptron algorithm is an example of a weight-update algorithm, which has the following general framework:

    Initialize w_1
    for t = 1, 2, ..., T:
        predict ŷ_t = sign(w_t · x_t)
        update w_{t+1} = F(w_t, x_t, y_t)

In general, the update rule could depend on the whole observation history, but we will only consider rules of the form F(w_t, x_t, y_t), i.e. update rules that use only the information from the most recent round. The perceptron algorithm uses the following initialization and update rules:

    Initialize w_1 = 0
    if ŷ_t ≠ y_t, where ŷ_t = sign(w_t · x_t):
        w_{t+1} = w_t + y_t x_t
    else:
        w_{t+1} = w_t

In the last lecture, we proved that, under the following assumptions:
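The weight-update framework above is easy to make concrete. Below is a minimal Python/NumPy sketch of the perceptron update; the synthetic data, the number of passes T, and the function name perceptron are illustrative assumptions (the lecture's version processes one new example per round rather than sweeping over a fixed dataset).

    import numpy as np

    def perceptron(X, y, T=100):
        """Perceptron weight-update: start with w_1 = 0 and, on every mistake,
        set w_{t+1} = w_t + y_t * x_t; otherwise leave the weights unchanged."""
        w = np.zeros(X.shape[1])                           # initialize w_1 = 0
        for _ in range(T):                                 # T passes over the data
            for x_t, y_t in zip(X, y):
                y_hat = 1 if np.dot(w, x_t) >= 0 else -1   # predict sign(w_t . x_t)
                if y_hat != y_t:                           # mistake round
                    w = w + y_t * x_t                      # additive update
        return w

    # Hypothetical linearly separable data with labels in {-1, +1}.
    X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = np.array([1, 1, -1, -1])
    w = perceptron(X, y)
    print(w, np.sign(X @ w))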
Similar resources
Theoretical Machine Learning COS 511 Lecture #9
In this lecture we consider a fundamental property of learning theory: it is amenable to boosting. Roughly speaking, boosting refers to the process of taking a set of rough “rules of thumb” and combining them into a more accurate predictor. Consider for example the problem of Optical Character Recognition (OCR) in its simplest form: given a set of bitmap images depicting hand-written postal-cod...
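As a rough illustration of combining "rules of thumb" into a more accurate predictor, here is a minimal AdaBoost-style sketch in Python; the weak rules, the data encoding, and the number of rounds T are assumptions made only for illustration, not the formulation used in the lecture.

    import numpy as np

    def boost(X, y, weak_rules, T=10):
        """Combine rough rules of thumb (each mapping an example to {-1, +1})
        into a weighted-majority predictor, AdaBoost-style."""
        m = len(X)
        D = np.full(m, 1.0 / m)                        # distribution over examples
        chosen, alphas = [], []
        for _ in range(T):
            preds = [np.array([h(x) for x in X]) for h in weak_rules]
            errs = [float(np.sum(D * (p != y))) for p in preds]
            j = int(np.argmin(errs))                   # best rule on the weighted data
            eps = min(max(errs[j], 1e-12), 1 - 1e-12)
            alpha = 0.5 * np.log((1 - eps) / eps)      # weight given to this rule
            D = D * np.exp(-alpha * y * preds[j])      # up-weight the mistakes
            D = D / D.sum()
            chosen.append(weak_rules[j]); alphas.append(alpha)
        return lambda x: int(np.sign(sum(a * h(x) for a, h in zip(alphas, chosen))))

    # Hypothetical weak rules on 1-D inputs: simple threshold tests.
    rules = [lambda x, t=t: 1 if x > t else -1 for t in (0.2, 0.5, 0.8)]
    X = np.array([0.1, 0.4, 0.6, 0.9]); y = np.array([-1, -1, 1, 1])
    H = boost(X, y, rules, T=5)
    print([H(x) for x in X])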
COS 511 : Theoretical Machine Learning
In other words, if ε ≤ 1/8 and δ ≤ 1/8, then PAC learning is not possible with fewer than d/2 examples. The outline of the proof is: to prove that there exists a concept c ∈ C and a distribution D, we are going to construct a fixed distribution D, but we do not know the exact target concept c used. Instead, we will choose c at random. If we get an expected probability of error over c, then there ...
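For reference, the claim in this excerpt can be written compactly as below; here m is the number of training examples and d is, presumably, the VC-dimension of the concept class C (an assumption based on the standard statement of this lower bound).

    \[
      \text{If } \epsilon \le \tfrac{1}{8} \text{ and } \delta \le \tfrac{1}{8},
      \text{ then any PAC learner for } C \text{ requires } m \ge \tfrac{d}{2}
      \text{ examples.}
    \]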
COS 511 : Theoretical Machine Learning
Suppose we are given examples x_1, x_2, ..., x_m drawn from a probability distribution D over some discrete space X. In the end, our goal is to estimate D by finding a model which fits the data but is not too complex. As a first step, we need to be able to measure the quality of our model. This is where we introduce the notion of maximum likelihood. To motivate this notion, suppose D is distribu...
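To make the maximum-likelihood idea concrete, here is a small Python sketch for the discrete case; the sample, the space X = {0, 1, 2}, and the helper name log_likelihood are hypothetical choices made only for illustration.

    import numpy as np

    def log_likelihood(counts, p):
        """Log-likelihood of an i.i.d. sample from a discrete distribution p,
        where counts[i] is how many times outcome i appeared in the sample."""
        return float(np.sum(counts * np.log(p)))

    # Hypothetical sample x_1, ..., x_m drawn over the discrete space X = {0, 1, 2}.
    sample = np.array([0, 1, 1, 2, 1, 0, 1, 2, 1, 1])
    counts = np.bincount(sample, minlength=3)

    # For a discrete space, the model that maximizes the likelihood of the data
    # is the empirical distribution of the sample.
    p_mle = counts / counts.sum()
    print(p_mle, log_likelihood(counts, p_mle))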
COS 511 : Theoretical Machine Learning
as the price relative, which is how much a stock goes up or down in a single day. S_t denotes the amount of wealth we have at the start of day t, and we assume S_1 = 1. We denote by w_t(i) the fraction of our wealth that we have in stock i at the beginning of day t, which can be viewed as a probability distribution since ∀i, w_t(i) ≥ 0 and ∑_i w_t(i) = 1. We can then derive the total wealth in stock i a...
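A small numerical sketch of the wealth recurrence implied by these definitions (in Python; the price relatives and the fixed portfolio below are made-up numbers, and the lecture allows w_t to change from day to day):

    import numpy as np

    # Hypothetical price relatives x_t(i): rows are days t, columns are stocks i,
    # where x_t(i) is the factor by which stock i moves during day t.
    X = np.array([[1.02, 0.99],
                  [0.97, 1.01],
                  [1.05, 1.00]])

    # Portfolio w_t(i): nonnegative and summing to 1, i.e. a probability
    # distribution over the stocks (held fixed across days in this sketch).
    w = np.array([0.5, 0.5])

    S = 1.0                         # S_1 = 1: initial wealth
    for x_t in X:
        S *= float(np.dot(w, x_t))  # wealth update: S_{t+1} = S_t * (w_t . x_t)
    print(S)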
COS 511 : Theoretical Machine Learning
Last class, we discussed an analogue of Occam’s Razor for infinite hypothesis spaces that, in conjunction with VC-dimension, reduced the problem of finding a good PAC-learning algorithm to the problem of computing the VC-dimension of a given hypothesis space. Recall that VC-dimension is defined using the notion of a shattered set, i.e. a subset S of the domain such that Π_H(S) = 2^{|S|}. In this le...
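To illustrate the definition of a shattered set, here is a small Python check; the threshold hypotheses and the test points are illustrative assumptions (a finite sample of a hypothesis class chosen for the example), not part of the lecture.

    def shatters(hypotheses, points):
        """Return True if the hypotheses realize all 2^|S| labelings of the
        points, i.e. Pi_H(S) = 2^{|S|} for S = points."""
        realized = {tuple(h(x) for x in points) for h in hypotheses}
        return len(realized) == 2 ** len(points)

    # Hypothetical 1-D threshold classifiers h_t(x) = 1 if x >= t, else 0.
    hypotheses = [lambda x, t=t / 10: int(x >= t) for t in range(-20, 21)]

    print(shatters(hypotheses, [0.5]))       # True: a single point can be shattered
    print(shatters(hypotheses, [0.3, 0.7]))  # False: the labeling (1, 0) is impossible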
Publication date: 2008